Voice-based order processing

ABSTRACT

A vehicle is detected at a drive thru terminal and an automated voice-based dialogue is established with a user in the vehicle and the terminal. An order is placed for a transaction using natural-language speech. The automated voice-based dialogue identifies commands, nouns associated with menu items, adjectives associated with modifying the menu items, prepositions associated with modifying the menu items, and exclamations associated with confirming the order and associated menu items. The automated speech is identified and processed based on a specific lexicon associated with a specific type of restaurant. The speech is translated into actions processed through an ordering system/interface as text-based instructions. Feedback for the order and guidance during the order are provided to the user through: speech played over a speaker associated with the terminal, text displayed on a display, and/or images associated with the ordered items.

BACKGROUND

Network-based shopping has steadily increased in recent years within the retail industry. Brick and mortar stores have suffered substantially as a result of this trend because of rent, costs, inventory of product, and staffing associated with maintaining the physical stores and loss of sales to online competition. One area where consumers are less likely to engage in shopping over the Internet is fast food, restaurants, and grocery stores. This is for a variety of reasons, such as consumers wanting these products at the time that they shop for them and wanting their food while they are traveling, such that online ordering makes little sense.

Employee expense has become an issue for many retailers. This is particularly true in the fast food and restaurant industries. Many state and city governments have mandated minimum wage rates that are particularly difficult for the industry to absorb. Customers are unwilling to pay the significantly higher prices that would be necessary, in many cases, to cover the new wage rates. As a result, some retailers have been forced to reduce employee hours and/or close their businesses.

In the fast food industry, a drive thru ordering service is expected by consumers. Consumers have become accustomed to eating in their vehicles or picking up food to take home or to their offices. However, a drive thru usually requires at least one and in some cases two employees that are not available to prepare food orders and that are dedicated to taking drive thru orders, which these employees then enter into a terminal for order fulfillment processing by other employees. This adds to the number of employees needed to staff the restaurant and accordingly adds to the retailer's overall employee-associated expenses. With wage rates spiking, many fast food restaurants are finding it particularly difficult to stay in business and/or make a reasonable profit on their investment.

SUMMARY

In various embodiments, methods and a system for voice-based order processing are presented.

According to an embodiment, a method for voice-based processing is presented. More particularly, a voice dialogue is established with a user for an order at a drive thru terminal. Speech provided by the user is translated into text commands associated with an order interface using a lexicon associated with a type of restaurant. The text commands are processed through the order interface. Speech feedback and guidance are provided at the drive thru terminal during the order. Finally, the order is completed at the drive thru terminal for subsequent order fulfillment processing.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a system for voice-based order processing, according to an example embodiment.

FIG. 2 is a diagram of a method for voice-based order processing, according to an example embodiment.

FIG. 3 is a diagram of another method for voice-based order processing, according to an example embodiment.

FIG. 4 is a diagram of another system for voice-based order processing, according to an example embodiment.

DETAILED DESCRIPTION

FIG. 1 is a diagram of a system 100 for voice-based order processing. The various components are illustrated and the arrangement of the components is presented for purposes of illustration only. It is to be noted that other arrangements with more or fewer components are possible without departing from the voice-based order processing teachings presented herein and below.

The techniques, methods, and system presented herein and below for voice-based order processing can be implemented in whole or in part in one, all, or some combination of the components shown with the system 100. The techniques and methods are programmed as executable instructions in memory and/or non-transitory computer-readable storage media and processed on one or more processors associated with the various components.

As used herein the terms “customer,” “consumer,” and “user” may be used synonymously and interchangeably.

The system 100 includes a server/Point-Of-Sale (POS) terminal 110, a drive thru terminal/digital sign 120, optionally a voice-enabled network device 130, and a vehicle 140 that optionally includes a mobile device 150 associated with a user in the vehicle 140.

The server/POS terminal 110 includes at least one processor, memory, and a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium includes executable instructions for a transaction manager 111, a voice interface 113, and a drive thru/display manager 114. The non-transitory computer-readable storage medium also includes menu item images 112 for items on a menu. The executable instructions are executed by the processor from the non-transitory computer-readable storage medium to perform the processing discussed below for the transaction manager 111, the voice interface 113, and the drive thru/display manager 114.

The drive thru terminal/display 120 includes at least one processor, memory, and a non-transitory computer-readable storage medium having executable instructions representing a drive thru voice interface 128 and a server/POS terminal agent 129. The drive thru terminal/display 120 also includes a display 121, a microphone 122, a speaker 123, a camera/sensor 124, optionally a card reader 125, optionally a currency dispenser/accepter (D/A) 126, and optionally a wireless transceiver 127. The processor executes the executable instructions from the non-transitory computer-readable storage medium to perform the processing discussed below with respect to the drive thru voice interface 128 and the server/POS terminal agent 129.

In an embodiment, the system 100 includes a voice-enabled network device 130 that includes a processor, memory, and a non-transitory computer-readable storage medium that includes a POS terminal service 131.

In an embodiment, the system 100 includes a mobile device 150 carried or operated by a user in the vehicle 140. The mobile device 150 includes a processor, memory, and a non-transitory computer-readable storage medium having executable instructions representing a mobile application 151. The processor executes the executable instructions to perform the processing discussed below with respect to the mobile application 151.

During operation of the system 100, a user in the vehicle 140, while in front of the drive thru terminal/display 120, uses spoken natural language words and phrases to place an order from a menu associated with a restaurant/fast food establishment.

In an embodiment, the camera/sensor 124 detects a vehicle in front of the drive thru terminal/display 120 and initiates the drive thru voice interface 128.

In an embodiment, the camera/sensor 124 detects a vehicle in front of the drive thru terminal/display 120 and processes the server/POS terminal agent 129 through a wired or wireless connection 115 to the server/POS terminal 110. The voice interface 113 is then initiated on the server/POS terminal 110, providing audio feedback to the speaker 123 and receiving speech input from the user through the microphone 122.

In an embodiment, the camera/sensor 124 detects a vehicle in front of the drive thru terminal/display 120 and processes the drive thru voice interface 128 to redirect speech input from the user received over the microphone 122 to the voice-enabled network device 130 over a wired or wireless connection 135. The voice-enabled network device 130 initiates the POS terminal service 131 for a dialogue with the user during the ordering, and audio/speech-based feedback produced by the POS terminal service 131 is routed over connection 135 to be played over the speaker 123.

In an embodiment, the drive thru voice interface 128, the voice interface 113, and/or the POS terminal service 131 is initiated based on a plurality of predefined wake-up words or phrases, such as: “Hello,” “I would like,” “can I have,” “can I get,” and others that are typically used when starting an order. In an embodiment, the wake-up words or phrases may be presented on the display 121 for the user to see when at the drive thru terminal/display 120. An automated voice response from the drive thru voice interface 128 or the voice interface 113 may include: “Please tell me what you would like to order,” “How can I serve you today,” “Ready to take/receive your order,” etc.

So, the drive thru voice interface 128, the voice interface 113, and/or the POS terminal service 131: can be initiated for an order based on automatic discovery of the vehicle 140 at the drive thru terminal/display 120 through image processing of images captured by the camera 124; can be initiated for an order based on automatic detection of the vehicle 140 through a sensor 124 (such as a light-based or motion-based sensor 124) that identifies the vehicle 140 at the drive thru terminal/display 120; and/or can be initiated for an order based on listening over the microphone 122 for predefined wake-up words or phrases.
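The three initiation triggers described above can be illustrated with a minimal Python sketch. It assumes that vehicle detection (by camera or sensor) has already been reduced to a boolean and that microphone audio has already been transcribed to text; the names `WAKE_PHRASES`, `normalize`, and `should_start_dialogue` are hypothetical and are not part of the described system.

```python
# Hypothetical wake-up phrases, taken from the examples given in the text.
WAKE_PHRASES = ("hello", "i would like", "can i have", "can i get")


def normalize(transcript: str) -> str:
    """Lowercase the transcript and replace punctuation with spaces."""
    kept = [ch if ch.isalnum() or ch.isspace() else " " for ch in transcript.lower()]
    return " ".join("".join(kept).split())


def should_start_dialogue(vehicle_detected: bool, transcript: str = "") -> bool:
    """Start a dialogue when a vehicle is detected or a wake-up phrase is heard."""
    if vehicle_detected:
        return True
    # Pad with spaces so phrases only match on whole-word boundaries.
    padded = f" {normalize(transcript)} "
    return any(f" {phrase} " in padded for phrase in WAKE_PHRASES)
```

Matching on whole words keeps a transcript such as "othello" from triggering on the embedded "hello"; a production system would more likely rely on the wake-word facility of its speech engine.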

In an embodiment, the drive thru voice interface 128, the voice interface 113, and/or the POS terminal service 131 are automatically initiated for taking an interactive speech-based order with the user based on a machine-learning algorithm that is trained on images to determine when the vehicle 140 is present at the drive thru terminal/display 120.

Once the drive thru voice interface 128, the voice interface 113, and/or the POS terminal service 131 are initiated (either through automated vehicle 140 discovery or predefined wake-up words or phrases), an interactive speech dialogue is initiated and the transaction manager 111 provides a transaction identifier for an order transaction. An initial greeting may be played over the speaker 123 to the customer and/or reproduced as text presented on the display 121. The transaction identifier starts a session or an interactive ordering dialogue for the user to communicate through speech an order and user-specific order details.

The voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 provides a real-time speech-based interface for the user to verbally communicate an order. A dialogue is created during which speech from the user received over the microphone 122 is converted to text (speech to text). The text is recognized and processed as commands and selections received from the user by the transaction manager 111. Output text that confirms order selections and order details for an order produced by the transaction manager 111 is processed by the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 to convey speech feedback for the order or speech instructions/guidance, which is played over the speaker 123 (text to speech). The feedback and instructions may also be concurrently presented on the display 121 by the server/POS terminal agent 129.

The lexicon of words and phrases that is recognized and processed for speech-to-text, transaction processing of an order, and text-to-speech feedback is predefined and constrained to the menu and menu options available at the restaurant. Constraining the vocabulary in this way substantially improves recognition accuracy. The predefined lexicon can include order-specific words and phrases as well; such words may include commands for: “want,” “like,” “make,” “add,” “order,” “buy,” “purchase,” “cancel,” “delete,” “remove,” “modify,” “change,” etc. The commands are identified as processing actions that are to be processed by the transaction manager 111 for a given order/transaction/session/dialogue. The supported nouns associated with commands vary based on the restaurant (fast food) type and the food menu of available food items. The nouns identify menu items associated with a given command. For example, a hamburger fast food restaurant may include nouns for: hamburger, cheeseburger, lettuce, tomato, onion, pickle, ketchup, mustard, mayo, fries, onion rings, soda (such as Coca-Cola®, Pepsi®, Sprite®, root beer, etc.), water, iced tea, coffee, etc. The adjectives can also be customized for the specific restaurant and its menu and menu options; such adjectives may include: small, medium, regular, large, extra-large, double, triple, Super-Size®, etc. The adjectives define characteristics of a given noun or set of nouns (such as a prepackaged meal). Prepositions may include “with” and “without” for including or not including something with a noun (menu item). The exclamations used for confirming through speech feedback that an order is correct may include: yes, no, or variants such as yea, nah, yep, nope, wrong, right, correct, incorrect.
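As a rough illustration of how a restricted lexicon might be applied to transcribed text, the following Python sketch tags each recognized word with its category and ignores everything else. The word lists are a small subset of the examples above, and `LEXICON`, `WORD_TO_CATEGORY`, and `tag_utterance` are hypothetical names chosen for this sketch; multi-word items such as "onion rings" are deliberately not handled.

```python
# Illustrative subset of a hamburger-restaurant lexicon, grouped by category.
LEXICON = {
    "command": {"want", "like", "add", "order", "cancel", "remove", "change"},
    "noun": {"hamburger", "cheeseburger", "fries", "onion", "pickle", "soda", "water"},
    "adjective": {"small", "medium", "large", "double", "triple"},
    "preposition": {"with", "without"},
    "exclamation": {"yes", "no", "yep", "nope", "correct", "incorrect"},
}

# Reverse index: word -> category, for constant-time lookup per token.
WORD_TO_CATEGORY = {word: cat for cat, words in LEXICON.items() for word in words}


def tag_utterance(text: str) -> list:
    """Return (word, category) pairs for the words found in the lexicon."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return [(tok, WORD_TO_CATEGORY[tok]) for tok in tokens if tok in WORD_TO_CATEGORY]
```

For the utterance "I would like a large cheeseburger without onion", this yields the tagged words like, large, cheeseburger, without, and onion; filler words such as "I", "would", and "a" fall outside the lexicon and are dropped.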

In an embodiment, the supported lexicon is specific to a given restaurant type and provided through an added skill to an existing voice-enabled network service as a specialized skill represented by the POS terminal service 131. The existing voice-enabled network service can include, by way of example, Siri®, Amazon Echo/Alexa®, Google Home®, etc. In this embodiment, the voice-enabled network device 130 is a modified version of one of these consumer-voice services that includes the specialized POS terminal service 131 as an enhanced skill for the existing voice-enabled network service. The POS terminal service 131 is coded for the specialized lexicon/dictionary and commands as discussed above and operates through the voice-enabled network device 130.

Words communicated through speech are received through the microphone 122 and translated to text by the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131. Sentences then become text words that are recognized as parts of speech for the commands, the adjectives, the noun menu items, the prepositions, and the exclamations. The components of the sentences are formatted as commands provided to an Application Programming Interface (API) of the transaction manager 111. The transaction manager 111 is processed as a component of the restaurant's POS terminal ordering system/interfaces. Output produced by the transaction manager 111 is translated to speech and played over the speaker 123 and/or presented on the display 121 during the order by the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131. Any visually presented feedback or guidance during the transaction/dialogue is processed by the server/POS terminal agent 129 to render such visual information on the display 121. The speech-based feedback and guidance during the transaction/dialogue is processed by the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 and played as audio over the speaker 123.
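The step of formatting sentence components as commands can be sketched as follows. This Python example assumes the words have already been tagged with their lexicon categories as (word, category) pairs; the `OrderCommand` structure, the `ACTION_ALIASES` table, and `build_command` are hypothetical and do not represent the actual API of the transaction manager 111.

```python
from dataclasses import dataclass, field

# Assumed mapping from spoken verbs to a smaller set of order actions.
ACTION_ALIASES = {"want": "add", "like": "add", "order": "add",
                  "buy": "add", "purchase": "add",
                  "delete": "remove", "modify": "change"}


@dataclass
class OrderCommand:
    action: str = "add"          # assumed default when no verb is spoken
    item: str = ""               # the menu item (first untagged noun)
    size: str = ""               # last adjective heard
    include: list = field(default_factory=list)   # nouns following "with"
    exclude: list = field(default_factory=list)   # nouns following "without"


def build_command(tagged: list) -> OrderCommand:
    """Fold (word, category) pairs into one structured order command."""
    command = OrderCommand()
    preposition = None
    for word, category in tagged:
        if category == "command":
            command.action = ACTION_ALIASES.get(word, word)
        elif category == "adjective":
            command.size = word
        elif category == "preposition":
            preposition = word
        elif category == "noun":
            if preposition == "with":
                command.include.append(word)
            elif preposition == "without":
                command.exclude.append(word)
            elif not command.item:
                command.item = word
    return command
```

A structure like this can be serialized (for example with `dataclasses.asdict`) into whatever text-based instruction format the ordering system/interface expects.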

The order/transaction/dialogue is a real-time dialogue and session for a transaction and the transaction is for placing an order with the restaurant. The user provides natural language voice instructions and receives natural language speech feedback, speech confirmation, and speech-based help instructions or guidance as needed during the session for the transaction.

In an embodiment, the server/POS terminal agent 129 utilizes a collection of menu item images 112 that map to menu items for the menu of the restaurant. So, when a user orders an item through speech, an image of that item is retrieved by the server/POS terminal agent 129 and presented on the display 121. The user can visually see what the user had communicated through speech as the items are ordered through speech by the user. This may be particularly helpful when a user sees that an image for an item ordered is not something that the user actually wanted (in other words, the user may have said something different from what the user actually wanted). This allows for speech-based revisions and changes during the order. In addition, text associated with the order may be presented with the corresponding images.

In an embodiment, when the user initially starts an order with a wake-up word, the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 may detect the language being spoken by the user and a dialect of a specific language. The voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131, and the server/POS terminal agent 129 then configure their speech and text provided as spoken and visual feedback and guidance for the detected language and dialect. So, any supported spoken language can be automatically recognized and used throughout the order/session/dialogue/transaction.

In an embodiment, the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 may also, upon completion of an order (with an exclamation being verified from the user for the order), provide speech instructions for the user to pay for the order. The order may be paid for at the drive thru terminal/display 120 or at a different station. In an embodiment, where the order is paid for at the drive thru terminal/display 120, the drive thru terminal/display 120 includes a card reader 125 and/or a currency dispenser/accepter 126. Other forms of payment may be used as well, such as Apple Pay®, NCR's JetPay®, Venmo®, digital wallets, etc. Control is passed to the transaction manager 111 and these peripherals for confirming payment was received from the user either through a credit/debit card or through currency.

In an embodiment, when the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 is unable to understand or translate the spoken words of the user during the order, the server/POS terminal agent 129 may initiate a voice call to a remote human ordering agent. This call can be initiated through cellular or through Voice over Internet Protocol (VoIP) from the drive thru terminal/display 120 to the remote order taker or to an order taker selected from a bank of order takers. The server/POS terminal agent 129 may then be remotely controlled through an API by the order taker during the order, and selections or details for the order are provided by the server/POS terminal agent 129 to the transaction manager 111 for processing with the local restaurant's ordering system/interface. In an embodiment, the call request is routed over connection 115 to the server/POS terminal 110 and the server/POS terminal 110 makes the call and provides the connection between the order taker and the user through the drive thru terminal/display 120 utilizing the display 121 and the microphone 122.
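The text does not say how the system decides that it is "unable to understand" the user. One plausible policy, shown in the Python sketch below purely as an assumption, is to hand the dialogue to a human order taker after a fixed number of consecutive unrecognized utterances. The class name `EscalationPolicy` and the threshold of two failures are hypothetical.

```python
class EscalationPolicy:
    """Tracks consecutive recognition failures for one ordering dialogue."""

    def __init__(self, max_consecutive_failures: int = 2):
        # Assumed threshold; the source does not specify one.
        self.max_consecutive_failures = max_consecutive_failures
        self.failures = 0

    def record(self, understood: bool) -> bool:
        """Record one utterance; return True when a human agent should be called."""
        # A successfully understood utterance resets the failure streak.
        self.failures = 0 if understood else self.failures + 1
        return self.failures >= self.max_consecutive_failures
```

When `record` returns True, the terminal agent would place the cellular or VoIP call described above; the call setup itself is outside the scope of this sketch.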

In an embodiment, the server/POS terminal agent 129 interacts with the wireless transceiver 127 to establish a connection to the mobile device 150 and communicate with the mobile application 151. In this embodiment, the user may conduct the voice-based order dialogue with the drive thru terminal/display 120 using the microphone of the mobile device 150 while remaining in the vehicle 140 with the windows of the vehicle 140 remaining up. Here, the voice-based feedback and guidance for the order/dialogue/transaction can be played over a speaker of the mobile device 150, over speaker 123 of the drive thru terminal/display 120, or over both the mobile device's speaker and the drive thru terminal/display's speaker. This may be particularly advantageous when there is inclement weather at the location associated with the drive thru terminal/display 120. Here, spoken user voice during the dialogue is provided over a wireless connection 145 (such as Bluetooth®) and feedback from the voice interface 113, the drive thru voice interface 128, and/or the POS terminal service 131 is routed over the wireless connection 145 and played over a speaker that is integrated into the mobile device 150. Additionally, in this embodiment, the mobile application 151 may be used to display the visual feedback and guidance provided over the wireless connection 145 on a display of the mobile device 150. In some cases, the mobile application 151 may provide selectable and changeable options for the order that the user can make through an interface of the mobile application 151. So, the input microphone can be microphone 122 or a mobile device microphone, while the output speaker can be speaker 123 or a mobile device speaker. Furthermore, a text-based menu interface may be interacted with by the user through a mobile application user interface on the mobile device 150.

In an embodiment, the microphone 122 input source feed is routed over connection 135 to the voice-enabled network device 130 and the speaker output source feed of the voice-enabled network device 130 is routed over connection 135 to the speaker 123. Connection 135 also provides the speech-to-text translations made by the POS terminal service 131 to the drive thru voice interface 128. The drive thru voice interface 128 then translates the text translations into transaction manager 111 commands and provides the commands to the server/POS terminal agent 129. The terminal agent 129 interacts with the transaction manager 111 through an API to have the commands for the order and the order details recorded for the transaction in the ordering system/interface. Output received from the transaction manager 111 is received by the terminal agent 129 and provided to the drive thru voice interface 128; the voice feedback is forwarded to the POS terminal service 131 and played over speaker 123 to the user. The terminal agent 129 also presents the feedback in visual format on the display 121; the visual format may include the images for items of the menu that were ordered (such as an image of a cheeseburger, etc.) and text formatted feedback.

In an embodiment, the mobile application 151 makes a wireless connection 146 to the POS terminal service 131 and provides voice-based ordering through the microphone of the mobile device 150. Speech and voice-based feedback is communicated back to the speakers of the mobile device 150 and/or also the speaker 123. Here, the POS terminal service 131 provides the voice input to the terminal agent 129 and plays the feedback for the order over the mobile device's speakers using connection 146. Separately, the POS terminal service 131 interacts with the drive thru voice interface 128 over a different connection 135 to permit the terminal agent 129 and the transaction manager 111 to process the order for the transaction in the manners discussed above. In this embodiment, the mobile application 151 reports its current geographical location and the POS terminal service 131 looks up an identifier for the terminal/display 120 based on the reported geographical location of the mobile device 150. This permits the POS terminal service 131 to make connection 135 with the appropriate terminal/display 120 that the vehicle 140 is located at.
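The lookup of a terminal identifier from a reported location could be done in many ways; the Python sketch below assumes a small registry of terminal coordinates and picks the nearest terminal within a distance limit using the haversine great-circle formula. The registry contents, the 30 meter limit, and the names `TERMINALS`, `haversine_m`, and `find_terminal` are all hypothetical.

```python
import math

# Hypothetical registry: terminal identifier -> (latitude, longitude) in degrees.
TERMINALS = {
    "terminal-A": (39.7392, -104.9903),
    "terminal-B": (39.7420, -104.9850),
}

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def find_terminal(location, terminals=TERMINALS, max_distance_m=30.0):
    """Return the identifier of the nearest terminal within range, or None."""
    best_id, best_distance = None, max_distance_m
    for terminal_id, position in terminals.items():
        distance = haversine_m(location, position)
        if distance <= best_distance:
            best_id, best_distance = terminal_id, distance
    return best_id
```

Returning None when no terminal is in range lets the service decline to connect rather than attach the order to the wrong drive thru lane.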

In an embodiment, the voice-enabled network device 130 resides on server 110 or is connected to POS terminal 110 rather than connected to the drive thru terminal/display 120. In this embodiment, the voice interface 113 manages voice to text and text to voice translations for interaction with the terminal agent 129 and/or transaction manager 111.

In an embodiment, the drive thru terminal/display 120 is a thin client, which includes just the display 121, the microphone 122, the speaker 123, optionally the camera/sensor 124, optionally the card reader 125, optionally the currency dispenser/accepter 126, optionally the wireless transceiver 127, and the terminal agent 129. In this embodiment, the terminal agent 129 provides received speech over connection 115 to the server/POS terminal 110 and all processing for the transaction/order/dialogue is performed on the server/POS terminal 110, with the drive thru/display manager 114 displaying the menu item images 112 on display 121 and the voice interface 113 playing the speech feedback over speaker 123.

The system 100 provides a variety of configurations for integrating a separate intelligent voice service with an existing ordering system/interface for purposes of allowing natural language based ordering at the drive thru terminal/display 120. The system 100 includes a variety of embodiments and additional order processing. The system 100 is customized for a specific restaurant type and specific menu of such a restaurant type, which allows for more accurate speech-to-text and text-to-speech processing. This is done with a restricted lexicon of words and phrases to identify user order commands, menu items (nouns), adjectives affecting the menu items, prepositions affecting menu items, and exclamations that confirm or do not confirm a given user order.

In an embodiment, the server/POS terminal 110 is a server 110. In an embodiment, the server 110 is a collection of servers that logically cooperate in a cloud processing environment. In an embodiment, the server 110 is connected over a Local Area Network (LAN) via 115 to the terminal/display 120. In an embodiment, the server 110 is connected over a Wide Area Network (WAN) through 115 to the terminal/display 120.

In an embodiment, the server/POS terminal 110 is a POS terminal 110. The POS terminal 110 is a transaction terminal located within the restaurant. In an embodiment, the POS terminal 110 can be a Self-Service Terminal (SST) 110 located within the restaurant and remote from the terminal/display 120, and the SST 110 also permits other customers or users that come within the restaurant to concurrently place self-orders on a touchscreen interface of the SST 110. In an embodiment, the POS terminal 110 is a cashier-assisted transaction terminal within the restaurant that concurrently takes and processes orders for customers/users that come inside to eat at the restaurant. In an embodiment, the POS terminal 110 is connected to the terminal/display 120 over wireless connection 115. In an embodiment, the POS terminal 110 is connected to the terminal/display 120 over a wired connection 115.

In an embodiment, a separate SST includes the card reader 125 and the currency dispenser/accepter 126. The user engages in a completed voice-based dialogue order at the terminal/display 120 and pulls the vehicle ahead to the SST where the user is instructed to pay for the order to complete the transaction.

In an embodiment, the user engages in a completed voice-based dialogue order at the terminal/display 120 and pulls up to a window where the user pays for the order to complete the transaction.

In an embodiment, the user engages in a completed voice dialogue and pays for the order to complete the transaction at the terminal/display 120 utilizing card reader 125 and/or the currency dispenser/accepter 126. The user then pulls ahead and picks up the order at a window from a human agent or the user pulls ahead to a designated window or food storage bin that automatically unlocks when the order is completed. Any automated window/bin may not include a human agent, albeit a human agent may fulfill the order and place it in the locked bin or inside the locked window. In an embodiment, the order is fulfilled by automated robots and placed in a locked bin or inside the locked window.

The embodiments discussed above and other embodiments will now be discussed with reference to the FIGS. 2-4 below.

FIG. 2 is a diagram of a method 200 for voice-based order processing, according to an example embodiment. The software module(s) that implements the method 200 is referred to as a “voice-order service.” The voice-order service is implemented as executable instructions programmed and residing within memory and/or a non-transitory computer-readable (processor-readable) storage medium and executed by one or more processors of a device. The processor(s) of the device that executes the voice-order service are specifically configured and programmed to process the voice-order service. The voice-order service has access to one or more networks during its processing. The networks can be wired, wireless, or a combination of wired and wireless.

In an embodiment, the device that executes the voice-order service is the drive thru terminal/display 120.

In an embodiment, the device that executes the voice-order service is the server 110.

In an embodiment, the device that executes the voice-order service is the POS terminal 110.

In an embodiment, the device that executes the voice-order service is the voice-enabled network device 130.

In an embodiment, a plurality of devices execute portions of the voice-order service; the devices include all or some combination of: the server/POS terminal 110, the drive thru terminal/display 120, the voice-enabled network device 130, and/or the mobile device 150.

In an embodiment, the voice-order service is all or some combination of: the transaction manager 111, voice interface 113, drive thru terminal/display manager 114, drive thru voice interface 128, server/POS terminal agent 129, POS terminal service 131, and/or mobile application 151.

At 210, the voice-order service initiates, at a drive thru terminal, an interactive voice dialogue/session with a user for an order. The user is situated within a vehicle in front of the drive thru terminal. In an embodiment, the drive thru terminal is the terminal/display 120.

In an embodiment, at 211, the voice-order service automatically detects a vehicle that includes the user in front of the drive thru terminal.

In an embodiment of 211 at 212, the voice-order service processes images taken in front of the drive thru terminal with a machine-learning algorithm to identify a presence of the vehicle.

In an embodiment of 212 at 213, the voice-order service identifies the vehicle through a motion or light sensor that detects the vehicle in front of the drive thru terminal.

In an embodiment, at 214, the voice-order service identifies a wake-up word or phrase received over a microphone of the drive thru terminal that is specific to ordering.

At 220, the voice-order service translates speech provided by the user into text commands associated with an order interface using a lexicon associated with a type of restaurant.

In an embodiment, at 221, the voice-order service requests clarification of a portion of the speech through a speech-clarification request played over the speaker to the user.

In an embodiment, at 222, the voice-order service summarizes the speech once translated and plays a summary of the speech over the speaker to the user.

At 230, the voice-order service processes text commands through the order interface.

In an embodiment, at 231, the voice-order service receives order pricing and order details from the order interface as text information.

At 240, the voice-order service provides, at the drive thru terminal, speech feedback and guidance to the user during the order over the speaker.

In an embodiment of 231 and 240, at 241, the voice-order service translates the text information into the speech feedback that is played over the speaker.

In an embodiment of 241 at 242, the voice-order service simultaneously presents the text information on a display of the drive thru terminal as the speech feedback is played over the speaker to the user.

In an embodiment of 242 at 243, the voice-order service presents images associated with the order items of the order details on the display with the text information. Each image is a customized picture or illustration of a given order item.

At 250, the voice-order service completes, at the drive thru terminal, the order for subsequent order fulfillment processing.

In an embodiment, at 260, the voice-order service processes the speech and the speech feedback through a voice-enabled network device interfaced to the drive thru terminal using an enabled skill on the voice-enabled network device for the type of restaurant. In an embodiment, the voice-enabled network device is an enhanced version of Siri®, Echo®, or Google Home® having the novel and enabled skill.
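The overall flow of the method 200 (initiate at 210, translate at 220, process at 230, give feedback at 240, complete at 250) can be sketched as a simple loop. In this Python sketch, speech recognition, the order interface, and text-to-speech are passed in as plain callables so the control flow can be shown in isolation; the function `run_order_dialogue`, its parameters, and the "confirm" command string are hypothetical.

```python
def run_order_dialogue(utterances, translate, order_interface, speak) -> str:
    """Drive one ordering session; `utterances` stands in for microphone input.

    translate(speech) -> list of text commands (empty if not understood)
    order_interface(command) -> text information about the order
    speak(text) -> plays text as speech feedback
    """
    speak("Please tell me what you would like to order.")  # 210: greeting
    for speech in utterances:
        commands = translate(speech)                        # 220: speech to text commands
        if not commands:
            speak("Sorry, could you repeat that?")          # 221: clarification request
            continue
        for command in commands:
            info = order_interface(command)                 # 230/231: process, get details
            speak(info)                                     # 240/241: speech feedback
            if command == "confirm":                        # assumed confirmation command
                return "completed"                          # 250: order completed
    return "incomplete"
```

Passing the three services as parameters mirrors the way the described embodiments place speech handling and order processing on different devices; any of them could be local to the terminal or reached over a network connection.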

FIG. 3 is a diagram of another method 300 for voice-based order processing, according to an example embodiment. The software module(s) that implement the method 300 are referred to herein as a “drive thru voice-enabled service.” The drive thru voice-enabled service is implemented as executable instructions and programmed within memory and/or a non-transitory computer-readable (processor-readable) storage medium that executes on one or more processors of a device. The processors of the device are specifically configured to execute the drive thru voice-enabled service. The drive thru voice-enabled service has access to one or more networks; the networks are wireless.

In an embodiment, the drive thru voice-enabled service is all or some combination of: transaction manager 111, voice interface 113, drive thru terminal/display manager 114, drive thru voice interface 128, server/POS terminal agent 129, POS terminal service 131, and/or the mobile application 151.

In an embodiment, the device that executes the drive thru voice-enabled service is server 110.

In an embodiment, the device that executes the drive thru voice-enabled service is the POS terminal 110.

In an embodiment, the device that executes the drive thru voice-enabled service is the drive thru terminal/display 120.

In an embodiment, the device that executes the drive thru voice-enabled service is a combination of all of the devices 110, 120, 130 and/or 150.

At 310, the drive thru voice-enabled service automatically initiates a voice dialogue for a transaction associated with an order of a user in a vehicle at a drive thru terminal based on detection of an event.

In an embodiment, at 311, the drive thru voice-enabled service detects the event as an indication of a vehicle that includes the user being in front of the drive thru terminal using a camera or a motion/light sensor.

In an embodiment, at 312, the drive thru voice-enabled service detects the event as a predefined wake-up word that initiates the order when detected over the microphone.
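The event detection at 310 through 312 may be illustrated, without limitation, by the following sketch. The wake-up phrases, the boolean vehicle_present flag (standing in for the output of a camera or motion/light sensor), and the function name detect_event are assumptions of this example.

```python
import re
from typing import Optional

# Hypothetical wake-up phrases specific to ordering.
WAKE_PHRASES = ("start my order", "hello")


def detect_event(vehicle_present: bool, transcript: str = "") -> Optional[str]:
    """Return the kind of event that should start a voice dialogue, or None."""
    # Sensor or camera indication of a vehicle (as at 311).
    if vehicle_present:
        return "vehicle"
    # Wake-up word or phrase detected in microphone speech (as at 312).
    normalized = re.sub(r"[^a-z ]", "", transcript.lower())
    normalized = " ".join(normalized.split())
    for phrase in WAKE_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", normalized):
            return "wake_word"
    return None
```

The word-boundary match keeps a wake-up phrase from firing when it only appears inside a longer word.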

At 320, the drive thru voice-enabled service translates speech received over a microphone from a user into text commands, text order items, and text order customizations.

At 330, the drive thru voice-enabled service formats the text commands, text order items, and text order customizations into order interface actions.

At 340, the drive thru voice-enabled service provides the order interface actions to a transaction manager using an Application Programming Interface (API).
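A minimal sketch of the formatting at 330 and the hand-off at 340 is given below for illustration only. The OrderAction structure, the JSON payload layout, the field names, and the transaction identifier format are hypothetical; the actual API of the transaction manager is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OrderAction:
    command: str                                   # text command, e.g. "add"
    item: str                                      # text order item, e.g. "burger"
    customizations: List[str] = field(default_factory=list)


def format_actions(commands, items, customizations) -> List[OrderAction]:
    """Pair each text command with its order item and that item's customizations."""
    return [OrderAction(c, i, list(m)) for c, i, m in zip(commands, items, customizations)]


def to_api_payload(transaction_id: str, actions: List[OrderAction]) -> str:
    """Serialize the actions into a JSON body for a hypothetical API call."""
    body = {"transaction": transaction_id, "actions": [asdict(a) for a in actions]}
    return json.dumps(body, sort_keys=True)
```

For example, to_api_payload("T-1", format_actions(["add"], ["burger"], [["large"]])) yields a JSON string carrying one action for transaction "T-1".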

In an embodiment, at 341, the drive thru voice-enabled service obtains a transaction number for the transaction and the order from the transaction manager.

At 350, the drive thru voice-enabled service receives order pricing and order details over the API from the transaction manager.

At 360, the drive thru voice-enabled service converts the order pricing and the order details into speech feedback.
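The conversion at 360 may, in one hedged example, first render the order pricing and order details as a sentence that is then handed to a text-to-speech engine. The sketch below covers only the sentence rendering; the function names, the representation of the total as integer cents, and the wording of the sentence are assumptions, and item names are not pluralized.

```python
from typing import List, Tuple


def price_to_speech(total_cents: int) -> str:
    """Render a price held as integer cents in spoken form."""
    dollars, cents = divmod(total_cents, 100)
    dollar_word = "dollar" if dollars == 1 else "dollars"
    cent_word = "cent" if cents == 1 else "cents"
    return f"{dollars} {dollar_word} and {cents} {cent_word}"


def order_to_speech(items: List[Tuple[str, int]], total_cents: int) -> str:
    """Build the sentence to be spoken from (name, quantity) pairs and a total."""
    parts = [f"{qty} {name}" for name, qty in items]
    return "You ordered " + ", ".join(parts) + ". Your total is " + price_to_speech(total_cents) + "."
```

Holding the total as integer cents avoids floating-point rounding when the price is split into dollars and cents.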

At 370, the drive thru voice-enabled service plays the speech feedback over a speaker to the user.

In an embodiment, at 371, the drive thru voice-enabled service presents the order pricing and the order details on a display of the drive thru terminal.

At 380, the drive thru voice-enabled service iterates back to 320 until speech confirmation is received from the user indicating the voice dialogue has concluded.
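The iteration at 380 may be pictured, as a non-limiting sketch, as a loop that processes each user utterance until a confirmation is heard. The confirmation phrases, the handle_turn callback (standing in for the processing at 320 through 370), and the function name run_dialogue are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical phrases treated as speech confirmation that the order is complete.
CONFIRMATIONS = {"that's all", "that is all", "done"}


def run_dialogue(utterances: Iterable[str], handle_turn: Callable[[str], None]) -> int:
    """Process utterances one turn at a time until a confirmation is received.

    Returns the number of turns processed before the confirmation.
    """
    turns = 0
    for utterance in utterances:
        text = utterance.lower().strip(" .!?")
        if text in CONFIRMATIONS:
            break  # the voice dialogue has concluded
        handle_turn(text)  # stands in for translate, format, price, and speak
        turns += 1
    return turns
```

Once the loop exits, processing would continue with the payment instructions for the transaction.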

At 390, the drive thru voice-enabled service provides instructions to the user for making payment to conclude the transaction.

In an embodiment, at 391, the drive thru voice-enabled service receives a credit card or currency by the drive thru terminal from the user to complete the transaction. It is to be noted that other non-card based forms of payment may also be processed, such as but not limited to: Apple Pay®, NCR JetPay®, Venmo®, digital wallets, virtual currency from a virtual currency wallet, etc.

FIG. 4 is a diagram of a system 400 for voice-based order processing, according to an example embodiment. Some components of the system 400 are programmed and reside within memory and/or a non-transitory computer-readable medium and execute on one or more processors of devices of the system 400. The system 400 also has access to and can communicate over one or more networks; and the networks are wireless, wired, or a combination of wired and wireless.

The system 400 is configured and programmed to perform the processing discussed above with FIGS. 1-3.

The system 400 includes a display 401, a microphone 402, a speaker 403, a processor 404, and non-transitory computer-readable storage medium 405 having executable instructions representing a voice-based ordering service 406.

In an embodiment, the processor is included within one of: a server 110, a POS terminal 110, an SST 110, terminal/display 120, or voice-enabled network device 130.

In an embodiment, the voice-based ordering service 406 is all or some combination of one or more of: the transaction manager 111, the voice interface 113, the drive-thru terminal display manager 114, the drive thru voice interface 128, the terminal agent 129, the POS terminal service 131, the mobile application 151, the method 200, and/or the method 300.

When the processor 404 loads the executable instructions for the voice-based ordering service 406 from the non-transitory computer-readable storage medium 405 and executes the voice-based ordering service 406, this causes the processor 404 to: engage a user while in a vehicle at a drive thru terminal 120 in an interactive voice dialogue for taking an order at a restaurant; translate speech received over the microphone 402 during the interactive voice dialogue into text actions; process the text actions with an order interface; receive order pricing and order details from the order interface responsive to processing the text actions; communicate through speech feedback the order pricing and order details over the speaker 403; display the order pricing and order details on the display 401; confirm through speech interaction that the order is complete; and request a payment method from the user to complete the order for order fulfillment processing.

In an embodiment, the voice-based ordering service 406 is further configured when executed by the processor 404 to cause the processor 404 to: display unique images representing order items on the display 401 during the interactive voice dialogue.

It should be appreciated that where software is described in a particular form (such as a component or module) this is merely to aid understanding and is not intended to limit how software that implements those functions may be architected or structured. For example, modules may be illustrated as separate modules, but may be implemented as homogenous code, as individual components, some, but not all of these modules may be combined, or the functions may be implemented in software structured in any other convenient manner.

Furthermore, although the software modules are illustrated as executing on one piece of hardware, the software may be distributed over multiple processors of a single device, or in any other convenient manner.

The above description is illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of embodiments should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

In the foregoing description of the embodiments, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting that the claimed embodiments have more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Description of the Embodiments, with each claim standing on its own as a separate exemplary embodiment.

1. (canceled)
2. A method, comprising: detecting a user in a vehicle at a drive thru terminal by image processing, by sensor-raised events, or by user-provided speech to a microphone; greeting the user by automated speech played over a speaker and reproduced as text presented on a display of the drive thru terminal; engaging the user in a natural language dialogue to start a session for a transaction with the user using a lexicon for automated speech processing that is specific to a menu and menu options associated with the drive thru terminal; receiving an order from the user during the session based on the dialogue; providing the order to order fulfillment processing for fulfillment; and providing instructions to the user during the dialogue using the automated speech processing for the user to provide a payment to complete the transaction, the session, and the dialogue.
3. The method of claim 2, wherein detecting further includes identifying a wake-up word or a wake-up phrase in the user-provided speech to the microphone.
4. The method of claim 2, wherein detecting further includes detecting a vehicle adjacent to or in front of the drive thru terminal in an image captured by a camera associated with the drive thru terminal.
5. The method of claim 2, wherein engaging further includes translating the user-provided speech during the dialogue into feedback text and presenting the feedback text on the display to confirm with the user the user-provided speech provided as responses during the dialogue.
6. The method of claim 5, wherein translating further includes translating automated speech for the automated speech processing generated during the dialogue into automated text and presenting the automated text on the display to ensure the user hears the automated speech and can also read the automated speech during the dialogue.
7. The method of claim 2, wherein receiving further includes obtaining menu item images as spoken by the user during the dialogue and presenting the menu item images on the display as visual feedback to the user of a given menu item ordered with the order by the user.
8. The method of claim 2, wherein receiving further includes initiating a remote voice call to an agent when the user-provided speech is unable to be translated for completing the order during the dialogue.
9. The method of claim 8, wherein initiating further includes receiving the order details for the order from the agent.
10. The method of claim 2 further comprising, receiving the user-provided speech during the dialogue from a mobile device operated by the user.
11. The method of claim 10 further comprising, providing the automated speech generated during the dialogue to a speaker of the mobile device.
12. The method of claim 2 further comprising, processing the method on a server remotely located from the drive thru terminal or processing the method on a Point-Of-Sale (POS) terminal located at a same establishment that the drive thru terminal is located.
13. A method, comprising: automatically greeting a user detected in a vehicle adjacent to a drive thru terminal with a spoken automatically generated greeting; obtaining a transaction identifier for a transaction from a Point-Of-Sale (POS) terminal; configuring automated speech processing based on a lexicon associated with a menu and menu items available from the drive thru terminal; initiating a session with the user using the transaction identifier; engaging the user during the session in a natural language dialogue to receive order details from the user for an order through the user-provided speech and automated generated speech using the automated speech processing; instructing the user during the dialogue on where and how a payment for the order can be supplied to complete the order; and submitting the order details with the transaction identifier to the POS terminal for order fulfillment and completion of the transaction.
14. The method of claim 13, wherein engaging further includes obtaining menu item images as spoken by the user during the dialogue from the POS terminal and displaying the menu item images as visual feedback to the user on a display of the drive thru terminal.
15. The method of claim 13, wherein engaging further includes translating the user-provided speech to user-identified text, translating the automated generated speech to feedback text, and presenting the user-identified text and the feedback text on a display of the drive thru terminal during the session as visual feedback to the user.
16. The method of claim 13, wherein instructing further includes using the automated generated speech to instruct the user to pull a vehicle of the user ahead to a next location where the payment for the order will be collected from the user.
17. The method of claim 13, wherein instructing further includes activating a card reader at the drive thru terminal to receive a payment card for the payment.
18. The method of claim 13, wherein instructing further includes instructing the user to pull ahead to a next location to receive items associated with the order after processing the payment.
19. The method of claim 13, wherein instructing further includes instructing the user to pull ahead to a next location to receive the items associated with the order from a locked bin that automatically unlocks when the items are placed in the bin after the order is fulfilled for collection by the user and after processing the payment.
20. A system, comprising: a drive thru terminal comprising a display, a microphone, and a camera; a Point-Of-Sale (POS) terminal interfaced to an order fulfillment system; a server comprising a processor and a non-transitory computer-readable storage medium; the non-transitory computer-readable storage medium comprises executable instructions; the executable instructions when executed by the processor from the non-transitory computer-readable storage medium cause the processor to perform operations comprising: automatically initiating a voice-based dialogue with a user when a vehicle of the user is detected adjacent to the drive thru terminal based on images provided by the camera or when the user speaks a wake-up word or phrase into the microphone; generating automated speech responses to user-provided speech during the voice-based dialogue; obtaining order details for an order being placed by the user based on the user-provided speech during the voice-based dialogue; providing the order details to the POS terminal for order fulfillment; and instructing the user on how and where to provide a payment to pay for the order and complete the order.
21. The system of claim 20, wherein the server is a cloud processing environment and the drive thru terminal is a thin client device.